Clustering is one of the fundamental problems in research. k-means is one of the most popular partitional clustering algorithms, but its result depends strongly on the choice of initial centers, so proper initialization is needed to obtain a good solution. To address this problem, k-means++ was proposed. It chooses the centers sequentially, each with probability proportional to its squared distance from the nearest center already chosen, and yields a near-optimal solution (O(log k)-competitive with the optimum in expectation). However, k-means++ scales poorly as the data size increases. Because the centers are chosen one at a time, each choice adds another pass of distance computations over the data, and these computations are largely repetitive and can produce overlapping centers. To improve scalability and efficiency, this paper presents MapReduce k-means++ with a pruning method. The k-means initialization algorithm is executed in the mapper phase, and the weighted k-means++ initialization algorithm is run in the reducer phase. Furthermore, to reduce expensive distance computations and avoid redundant clusters, a pruning strategy, also implemented on MapReduce, locates distinct cluster centers. Experiments are carried out on a synthetic dataset and the Oxford dataset, and the performance results indicate that the proposed MapReduce k-means++ with pruning is efficient.
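To make the two-phase scheme concrete, the following is a minimal single-process sketch, not the authors' implementation. It assumes that each "mapper" runs k-means++ seeding on its own data split and emits each local center weighted by the number of split points closest to it, and that the "reducer" runs weighted k-means++ on the pooled local centers to select the final k. The pruning rule is an assumption: a standard triangle-inequality bound (as in Elkan's accelerated k-means) that skips a distance computation when a new center provably cannot be closer than a point's current nearest center. The paper's own pruning strategy may differ. All function names (`weighted_kmeanspp`, `mapper`, `reducer`, `mapreduce_kmeanspp`) are hypothetical, and MapReduce is only simulated with plain Python lists.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_kmeanspp(points: Sequence[Point], weights: Sequence[float],
                      k: int, rng: random.Random):
    """Weighted D^2 seeding: each new center is drawn with probability
    proportional to weight(x) * D(x)^2, where D(x) is the distance from x
    to its nearest already-chosen center.
    Returns (centers, owner, skipped): owner[i] is the index of the center
    nearest to points[i]; skipped counts pruned distance computations."""
    n = len(points)
    first = rng.choices(range(n), weights=weights, k=1)[0]
    centers: List[Point] = [points[first]]
    nearest = [dist(p, centers[0]) for p in points]
    owner = [0] * n
    skipped = 0
    while len(centers) < min(k, n):
        scores = [w * d * d for w, d in zip(weights, nearest)]
        if sum(scores) == 0:
            break  # every point coincides with a chosen center
        new = points[rng.choices(range(n), weights=scores, k=1)[0]]
        centers.append(new)
        j = len(centers) - 1
        # Distances from every previous center to the new one.
        d_to_new = [dist(c, new) for c in centers[:-1]]
        for i, p in enumerate(points):
            # Assumed pruning rule (triangle inequality): if
            # d(c, new) >= 2 * d(p, c), then d(p, new) >= d(p, c),
            # so the new center cannot become p's nearest center.
            if d_to_new[owner[i]] >= 2 * nearest[i]:
                skipped += 1
                continue
            d = dist(p, new)
            if d < nearest[i]:
                nearest[i], owner[i] = d, j
    return centers, owner, skipped


def mapper(split: Sequence[Point], k: int, rng: random.Random):
    """Hypothetical map step: local k-means++ on one split; emit
    (center, weight) pairs, weight = number of split points it owns."""
    centers, owner, _ = weighted_kmeanspp(split, [1.0] * len(split), k, rng)
    counts = [0] * len(centers)
    for o in owner:
        counts[o] += 1
    return list(zip(centers, counts))


def reducer(pairs, k: int, rng: random.Random) -> List[Point]:
    """Hypothetical reduce step: weighted k-means++ over all local centers."""
    pts = [c for c, _ in pairs]
    ws = [float(w) for _, w in pairs]
    centers, _, _ = weighted_kmeanspp(pts, ws, k, rng)
    return centers


def mapreduce_kmeanspp(data: Sequence[Point], k: int, n_splits: int,
                       seed: int = 0) -> List[Point]:
    """Simulate the MapReduce job in-process: split, map, then reduce."""
    rng = random.Random(seed)
    splits = [data[i::n_splits] for i in range(n_splits)]
    pairs = [pair for s in splits if s for pair in mapper(s, k, rng)]
    return reducer(pairs, k, rng)


if __name__ == "__main__":
    gen = random.Random(42)
    blobs = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    data = [(cx + gen.gauss(0, 0.5), cy + gen.gauss(0, 0.5))
            for cx, cy in blobs for _ in range(100)]
    print(mapreduce_kmeanspp(data, k=3, n_splits=4, seed=1))
```

Because the reducer only sees `n_splits * k` weighted local centers instead of the full dataset, its sequential D^2 passes are cheap. The weights preserve how much data each local center represents, so dense regions remain more likely to receive a final center.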